Harden NexusTokens for polymorphism with internal whitespace; add NexusTokensToInteger() by ms609 · Pull Request #269 · ms609/TreeTools

ms609 · 2026-05-18T12:03:44Z

Summary

Regression coverage for Cingulata-style polymorphism tokens with internal whitespace (e.g. (1 2), {0 1}, including continuation lines). No parser fix was needed: gsub(\" \", \"\", ...) at R/parse_files.R:88 already strips internal whitespace before NexusTokens() sees the tokens, so (1 2) becomes (12). The new test in tests/testthat/test-parsers.R locks that behaviour down.
New exported helper NexusTokensToInteger() (R/parse_files.R:887): converts the character matrix from ReadCharacters() (or a vector from NexusTokens(), or a phyDat object) to an integer matrix.
- polymorphism = c(\"?\", \"first\", \"last\") — default \"?\" treats polymorphic tokens like NEXUS missing data (→ NA_integer_). \"first\"/\"last\" take the corresponding state digit inside the brackets.
- phyDat input is routed through PhyDatToMatrix(ambigNA = TRUE, inappNA = TRUE) so fully-ambiguous and inapplicable rows become NA_integer_ and only true partial polymorphism is subject to the polymorphism rule.
- Only digit states 0..9 are recognised; non-digit symbols (and any token whose interior contains no digits) map to NA_integer_. Documented in @details.
Bug fix in the polymorphism extractor. The earlier draft used regmatches(x[ambig], regexpr(...)), which silently dropped no-match elements and crashed (or recycled wrongly) on tokens like \"(AB)\" or \"()\". Replaced with a length-preserving pattern that pads via the m != -1L mask. Regression test added.
TNT test cleanup — drops a handful of dead TNT fixtures and test-ReadTntTree.R test cases that no longer exercise live code paths after the parser simplification on this branch.
Roxygen 8 redoc of all man/*.Rd files.
Version bumped to 2.3.0.9000 (development) with a NEWS entry.

Test plan

devtools::document() regenerates NAMESPACE (export added at line 375) and man/NexusTokensToInteger.Rd.
devtools::test(filter = \"parsers\") — all parser tests pass, including the new Cingulata block, the round-trip from ReadCharacters() into NexusTokensToInteger(), the no-digit polymorphism regression, the named-vector input case, attribute preservation for state.labels, and the phyDat → integer routing.
Full devtools::test() — no regressions. Skipped tests are pre-existing slow-only.
Reviewer: please run R CMD check locally if you'd like full CRAN-grade assurance before merging.

🤖 Generated with Claude Code

…tespace No parser fix was needed: the existing gsub(" ", "", ...) at parse_files.R:88 already strips internal whitespace from polymorphism tokens before NexusTokens() sees them, so (1 2) -> (12) and {0 1} -> {01} already worked correctly. Added: - Regression test covering (1 2) / {0 1} polymorphism and multi-line matrix continuation in test-parsers.R. - NexusTokensToInteger(): a new exported helper that converts the character matrix from ReadCharacters() to integer, mapping polymorphic/ambiguous/?/- tokens to NA_integer_ by default, or extracting the first/last state digit under polymorphism = "first"/"last". Tests included. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…espace

Resolve conflicts from PR #268 (TNT parser fixes) landing on main while this branch was open: - DESCRIPTION: bump to 2.3.0.9002 (past main's 9001); drop duplicate Config/roxygen2/version line introduced by the merge. - NEWS.md: consolidate dev entries from both sides under the new header. - inst/extdata/tests/tnt-*.tnt, tests/testthat/test-ReadTntTree.R: keep main's versions; the branch's earlier deletions were premature. - man/*.Rd: regenerated via devtools::document() so @family references include both NexusTokensToInteger() (this branch) and the TNT additions from main. R/parse_files.R auto-merged cleanly. Full devtools::test() green.

github-actions · 2026-05-18T12:21:17Z

⚠️ This benchmark result is outdated. See the latest comment below.

Performance benchmark results

Call	Status	Change	Time (ms)
`as.Splits(bigTrees)`	⚪ NSD	-0.55%	30.2 → 31.1, 30.3
`as.Splits(someTrees)`	⚪ NSD	-0.76%	9.93 → 9.98, 10
`Consensus(forest1k.888, check = FALSE)`	⚪ NSD	0.73%	88.2 → 87.6, 87.9
`Consensus(forest201.80, check = FALSE)`	⚪ NSD	-0.69%	4.02 → 4.02, 4.07
`Consensus(forest21.260, 0.5, FALSE)`	⚪ NSD	0.38%	1.13 → 1.12, 1.13
`Consensus(forest21.260)`	⚪ NSD	0.42%	1.13 → 1.12, 1.12
`Consensus(forestMaj, 0.5, FALSE)`	⚪ NSD	0.45%	2.88 → 2.85, 2.89
`DropTip(tr2000, 5)`	⚪ NSD	-0.52%	26 → 26.3, 25.9
`DropTip(tr80, 5)`	⚪ NSD	0.02%	0.0893 → 0.0896, 0.089
`DropTip(unlen2k, 5)`	⚪ NSD	-0.38%	0.191 → 0.191, 0.192
`DropTip(unlen80, 5)`	⚪ NSD	-0.15%	0.034 → 0.0341, 0.0341
`lapply(bigSplits, as.phylo)`	⚪ NSD	0.66%	28.4 → 27.5, 28.3
`lapply(someSplits, as.phylo)`	⚪ NSD	0.32%	11.6 → 11.5, 11.7
`PathLengths(tr2000, full = TRUE)`	⚪ NSD	1.29%	25.2 → 25.1, 24.5
`PathLengths(tr80, full = TRUE)`	⚪ NSD	1.96%	0.0782 → 0.0764, 0.0768
`PathLengths(tr80Unif, full = TRUE)`	⚪ NSD	2.64%	0.0802 → 0.0779, 0.0782
`RootTree(tr2000, 5)`	⚪ NSD	0%	0.3 → 0.299, 0.301
`RootTree(tr80, c("t3", "t36"))`	⚪ NSD	-0.01%	0.0574 → 0.0572, 0.0576
`RootTree(tr80, "t3")`	⚪ NSD	0.03%	0.0405 → 0.0404, 0.0405
`RootTree(tr80, "t30")`	⚪ NSD	-0.11%	0.0405 → 0.0405, 0.0405
`RootTree(unlen2k, 5)`	⚪ NSD	-0.34%	0.287 → 0.286, 0.29
`RootTree(unlen80, c("t3", "t36"))`	⚪ NSD	-0.1%	0.0529 → 0.0529, 0.053
`RootTree(unlen80, "t3")`	⚪ NSD	0.05%	0.0356 → 0.0355, 0.0356
`RootTree(unlen80, "t30")`	⚪ NSD	-0.47%	0.0356 → 0.0358, 0.0358
`TreeDist::RobinsonFoulds(forest201.80)`	⚪ NSD	0.29%	16.6 → 16.4, 16.6
`TreeDist::RobinsonFoulds(forest21.888)`	⚪ NSD	0.28%	3.13 → 3.11, 3.14
`TreeTools:::path_lengths(tr80$edge, tr80$edge.length, FALSE)`	⚪ NSD	2.05%	0.0695 → 0.068, 0.0682
`TreeTools:::postorder_order(bal40)`	⚪ NSD	0.88%	0.00148 → 0.00147, 0.00147
`TreeTools:::postorder_order(bal40k)`	⚪ NSD	0.12%	0.314 → 0.313, 0.315
`TreeTools:::postorder_order(dbal40)`	⚪ NSD	-1.45%	0.00165 → 0.00166, 0.00168
`TreeTools:::postorder_order(dbal40k)`	⚪ NSD	-0.14%	2.18 → 2.18, 2.19
`TreeTools:::postorder_order(dpec40)`	⚪ NSD	-1.37%	0.00242 → 0.00245, 0.00246
`TreeTools:::postorder_order(dpec40k)`	⚪ NSD	0.09%	4440 → 4440, 4430
`TreeTools:::postorder_order(drnd80)`	⚪ NSD	-1%	0.00389 → 0.00392, 0.00394
`TreeTools:::postorder_order(nbal40)`	⚪ NSD	-0.59%	0.00201 → 0.00202, 0.00203
`TreeTools:::postorder_order(nbal40k)`	⚪ NSD	-0.17%	2.43 → 2.43, 2.44
`TreeTools:::postorder_order(npec40)`	⚪ NSD	0.49%	0.00277 → 0.00277, 0.00275
`TreeTools:::postorder_order(npec40k)`	⚪ NSD	-0.3%	4430 → 4440, 4450
`TreeTools:::postorder_order(nrnd80)`	⚪ NSD	-0.25%	0.00464 → 0.00462, 0.00471
`TreeTools:::postorder_order(pec40)`	⚪ NSD	-0.54%	0.00148 → 0.00147, 0.00149
`TreeTools:::postorder_order(pec40k)`	⚪ NSD	-10.62%	0.314 → 0.314, 0.441
`TreeTools:::postorder_order(rnd80)`	⚪ NSD	1.02%	0.00176 → 0.00173, 0.00174

github-actions · 2026-05-18T12:28:20Z

Performance benchmark results

Call	Status	Change	Time (ms)
`as.Splits(bigTrees)`	⚪ NSD	-2.25%	22.3 → 22.9, 22.3
`as.Splits(someTrees)`	⚪ NSD	-0.84%	11.3 → 11.2, 11.4
`Consensus(forest1k.888, check = FALSE)`	⚪ NSD	-2.19%	101 → 99.8, 108
`Consensus(forest201.80, check = FALSE)`	⚪ NSD	-1.01%	4.1 → 4.06, 4.19
`Consensus(forest21.260, 0.5, FALSE)`	⚪ NSD	-1.3%	1.24 → 1.24, 1.26
`Consensus(forest21.260)`	⚪ NSD	-0.83%	1.26 → 1.26, 1.28
`Consensus(forestMaj, 0.5, FALSE)`	⚪ NSD	-0.12%	2.99 → 2.94, 3.02
`DropTip(tr2000, 5)`	⚪ NSD	-0.01%	17.4 → 17.3, 17.6
`DropTip(tr80, 5)`	⚪ NSD	4.25%	0.111 → 0.107, 0.106
`DropTip(unlen2k, 5)`	⚪ NSD	-1.38%	0.208 → 0.211, 0.211
`DropTip(unlen80, 5)`	⚪ NSD	0.56%	0.041 → 0.0411, 0.0405
`lapply(bigSplits, as.phylo)`	⚪ NSD	0.3%	29.5 → 29.2, 29.8
`lapply(someSplits, as.phylo)`	🟣 ~same	3%	14.1 → 13.5, 13.7
`PathLengths(tr2000, full = TRUE)`	🟣 ~same	2.92%	16.9 → 16.4, 16.5
`PathLengths(tr80, full = TRUE)`	⚪ NSD	0.9%	0.106 → 0.105, 0.105
`PathLengths(tr80Unif, full = TRUE)`	⚪ NSD	1.11%	0.109 → 0.107, 0.107
`RootTree(tr2000, 5)`	⚪ NSD	-0.35%	0.399 → 0.402, 0.399
`RootTree(tr80, c("t3", "t36"))`	⚪ NSD	0.65%	0.0722 → 0.0719, 0.0714
`RootTree(tr80, "t3")`	⚪ NSD	1.45%	0.0516 → 0.0512, 0.0504
`RootTree(tr80, "t30")`	⚪ NSD	1.12%	0.0517 → 0.0513, 0.0508
`RootTree(unlen2k, 5)`	⚪ NSD	-3.54%	0.331 → 0.328, 0.349
`RootTree(unlen80, c("t3", "t36"))`	⚪ NSD	0.48%	0.0672 → 0.067, 0.0667
`RootTree(unlen80, "t3")`	⚪ NSD	0%	0.0443 → 0.0442, 0.0445
`RootTree(unlen80, "t30")`	⚪ NSD	0.49%	0.045 → 0.0445, 0.045
`TreeDist::RobinsonFoulds(forest201.80)`	⚪ NSD	2.67%	16.8 → 16.2, 16.6
`TreeDist::RobinsonFoulds(forest21.888)`	⚪ NSD	-0.15%	3.47 → 3.47, 3.49
`TreeTools:::path_lengths(tr80$edge, tr80$edge.length, FALSE)`	⚪ NSD	1.92%	0.0961 → 0.0943, 0.0943
`TreeTools:::postorder_order(bal40)`	⚪ NSD	1.16%	0.00173 → 0.0017, 0.00172
`TreeTools:::postorder_order(bal40k)`	⚪ NSD	0.49%	0.435 → 0.431, 0.434
`TreeTools:::postorder_order(dbal40)`	⚪ NSD	1.61%	0.0018 → 0.00177, 0.00177
`TreeTools:::postorder_order(dbal40k)`	⚪ NSD	-0.29%	2.03 → 2.02, 2.04
`TreeTools:::postorder_order(dpec40)`	⚪ NSD	1.16%	0.0026 → 0.00256, 0.00257
`TreeTools:::postorder_order(dpec40k)`	⚪ NSD	0.16%	3300 → 3300, 3300
`TreeTools:::postorder_order(drnd80)`	⚪ NSD	0.02%	0.00409 → 0.00408, 0.00409
`TreeTools:::postorder_order(nbal40)`	⚪ NSD	-0.47%	0.00211 → 0.00212, 0.00211
`TreeTools:::postorder_order(nbal40k)`	⚪ NSD	-0.15%	2.2 → 2.2, 2.2
`TreeTools:::postorder_order(npec40)`	⚪ NSD	0.72%	0.00291 → 0.00289, 0.00289
`TreeTools:::postorder_order(npec40k)`	🟣 ~same	0.38%	3330 → 3310, 3310
`TreeTools:::postorder_order(nrnd80)`	⚪ NSD	1.31%	0.00466 → 0.00461, 0.00458
`TreeTools:::postorder_order(pec40)`	⚪ NSD	1.74%	0.00172 → 0.00169, 0.00168
`TreeTools:::postorder_order(pec40k)`	⚪ NSD	-0.28%	0.431 → 0.431, 0.433
`TreeTools:::postorder_order(rnd80)`	⚪ NSD	-0.46%	0.00214 → 0.00215, 0.00215

codecov · 2026-05-18T12:28:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.07%. Comparing base (ee67ba8) to head (db1238f).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #269   +/-   ##
=======================================
  Coverage   96.07%   96.07%           
=======================================
  Files          80       80           
  Lines        5905     5968   +63     
=======================================
+ Hits         5673     5734   +61     
- Misses        232      234    +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ms609 and others added 6 commits May 18, 2026 10:31

redoc w/ roxygen8

5f3c742

Complete impl

a2b1039

Merge remote-tracking branch 'origin/main' into fix-polymorphism-whit…

1182ffa

…espace

sp

db1238f

ms609 merged commit 05036bb into main May 18, 2026
33 of 34 checks passed

ms609 deleted the fix-polymorphism-whitespace branch May 18, 2026 12:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harden NexusTokens for polymorphism with internal whitespace; add NexusTokensToInteger()#269

Harden NexusTokens for polymorphism with internal whitespace; add NexusTokensToInteger()#269
ms609 merged 6 commits into
mainfrom
fix-polymorphism-whitespace

ms609 commented May 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 18, 2026 •

edited

Loading

Uh oh!

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

codecov Bot commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ms609 commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance benchmark results

Uh oh!

Uh oh!

github-actions Bot commented May 18, 2026

Performance benchmark results

Uh oh!

codecov Bot commented May 18, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ms609 commented May 18, 2026 •

edited

Loading

github-actions Bot commented May 18, 2026 •

edited

Loading